
Add Traefik exposure support #827

Merged
tomach merged 1 commit into master from ta/traefik-exposure on May 8, 2026

Contributor

tomach commented Apr 16, 2026

Summary

This introduces spec.cluster.exposure to optionally expose CrateDB clusters via Traefik (IngressRouteTCP) instead of a LoadBalancer. This reduces load balancer quota usage (e.g., on AWS).

Changes

CRD

  • Added exposure enum field (loadbalancer | traefik). Defaults to loadbalancer in the operator.
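
A minimal sketch of how the default could be resolved on the operator side. The function name and constants are hypothetical and only illustrate the "missing field means loadbalancer" behavior described above; this is not the operator's actual code.

```python
from typing import Any, Mapping

# The two enum values named in the CRD change above.
VALID_EXPOSURES = ("loadbalancer", "traefik")
DEFAULT_EXPOSURE = "loadbalancer"


def resolve_exposure(spec: Mapping[str, Any]) -> str:
    """Return the effective exposure mode for a CrateDB resource spec.

    A missing or empty spec.cluster.exposure falls back to the default,
    so resources created before the field existed keep their behavior.
    """
    cluster = spec.get("cluster") or {}
    exposure = cluster.get("exposure") or DEFAULT_EXPOSURE
    if exposure not in VALID_EXPOSURES:
        raise ValueError(f"unsupported exposure: {exposure!r}")
    return exposure
```
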

Service creation

  • When exposure: traefik, the operator creates a ClusterIP service.
  • Cloud‑specific annotations (aws-load-balancer-*, azure-load-balancer-*) are only added for loadbalancer.
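
The service selection above can be sketched as a plain manifest builder. The helper name, the port names, and the way annotations are passed in are assumptions for illustration; only the service types and the two ports (4200, 5432) come from this description.

```python
from typing import Any, Dict, Mapping


def build_service(
    name: str, exposure: str, cloud_annotations: Mapping[str, str]
) -> Dict[str, Any]:
    """Build a Service manifest whose type depends on the exposure mode."""
    is_loadbalancer = exposure == "loadbalancer"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            # Cloud-specific annotations only make sense for a LoadBalancer.
            "annotations": dict(cloud_annotations) if is_loadbalancer else {},
        },
        "spec": {
            "type": "LoadBalancer" if is_loadbalancer else "ClusterIP",
            "ports": [
                {"name": "http", "port": 4200},  # port names are hypothetical
                {"name": "psql", "port": 5432},
            ],
        },
    }
```
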

Traefik resources (for exposure: traefik)

  • MiddlewareTCP - created only if allowedCIDRs is non‑empty (IP allowlist).
  • IngressRouteTCP (ports 4200 & 5432) - reference the middleware when it exists.
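
For illustration, the two Traefik resources could be built roughly as follows. This is a sketch under assumptions: it uses the `traefik.io/v1alpha1` API version and the `ipAllowList` field as in Traefik v3 (older releases call it `ipWhiteList`), the route name and entry point are hypothetical, and the middleware name follows the `cratedb-allow-<name>` pattern visible in the logs further down.

```python
from typing import Any, Dict, Optional, Sequence

TRAEFIK_API_VERSION = "traefik.io/v1alpha1"  # assumption: Traefik v3 CRDs


def middleware_name(cluster_name: str) -> str:
    return f"cratedb-allow-{cluster_name}"


def build_middleware(
    cluster_name: str, allowed_cidrs: Sequence[str]
) -> Optional[Dict[str, Any]]:
    """Return a MiddlewareTCP manifest, or None when no CIDRs are set."""
    if not allowed_cidrs:
        return None
    return {
        "apiVersion": TRAEFIK_API_VERSION,
        "kind": "MiddlewareTCP",
        "metadata": {"name": middleware_name(cluster_name)},
        "spec": {"ipAllowList": {"sourceRange": list(allowed_cidrs)}},
    }


def build_route(
    cluster_name: str,
    service_name: str,
    port: int,
    entry_point: str,
    with_middleware: bool,
) -> Dict[str, Any]:
    """Return an IngressRouteTCP manifest for one port (4200 or 5432)."""
    route: Dict[str, Any] = {
        "match": "HostSNI(`*`)",
        "services": [{"name": service_name, "port": port}],
    }
    if with_middleware:
        # Reference the middleware only when it actually exists.
        route["middlewares"] = [{"name": middleware_name(cluster_name)}]
    return {
        "apiVersion": TRAEFIK_API_VERSION,
        "kind": "IngressRouteTCP",
        "metadata": {"name": f"{cluster_name}-{port}"},  # hypothetical name
        "spec": {"entryPoints": [entry_point], "routes": [route]},
    }
```
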

CIDR updates

  • When allowedCIDRs changes, the operator updates the Traefik middleware accordingly and adjusts the IngressRouteTCP routes to add/remove the middleware reference.
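
One way to reason about the CIDR transitions is as a small planning function. The action names and their ordering are assumptions made for this sketch (for example, dropping the route reference before deleting the middleware so that routes never point at a missing resource); the operator may sequence things differently.

```python
from typing import List, Sequence


def plan_cidr_update(old: Sequence[str], new: Sequence[str]) -> List[str]:
    """List the actions needed when allowedCIDRs changes from old to new."""
    had, has = bool(old), bool(new)
    if not had and has:
        # First CIDR added: the middleware does not exist yet.
        return ["create_middleware", "add_route_reference"]
    if had and not has:
        # Last CIDR removed: detach the routes before deleting.
        return ["remove_route_reference", "delete_middleware"]
    if had and has and set(old) != set(new):
        # Middleware already exists: only its source ranges change.
        return ["patch_middleware"]
    return []
```

The third branch is the path that the failing "second IP" test later in this thread exercises: the first IP goes through creation, any further change goes through a patch.
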

Exposure changes

  • Changing exposure between loadbalancer and traefik (in either direction) patches the existing service.
  • Traefik resources are created or deleted as needed.
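
As a rough sketch, the exposure switch can be expressed as a list of steps per direction. The step names and their order are hypothetical and only mirror the two bullets above (patch the service, then create or delete the Traefik resources).

```python
from typing import List, Sequence


def plan_exposure_change(
    old: str, new: str, allowed_cidrs: Sequence[str]
) -> List[str]:
    """List the steps for switching between loadbalancer and traefik."""
    if old == new:
        return []
    if new == "traefik":
        steps = ["patch_service_to_clusterip"]
        if allowed_cidrs:
            # The middleware is only needed when an allowlist is set.
            steps.append("create_middleware")
        steps.append("create_ingress_routes")
        return steps
    # Switching back: remove Traefik resources, restore the LoadBalancer.
    return [
        "delete_ingress_routes",
        "delete_middleware",
        "patch_service_to_loadbalancer",
    ]
```
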

Suspend / Resume

  • Suspending (scaling to 0) deletes Traefik resources and the ClusterIP service.
  • Resuming (scaling back to 1) recreates them using the current spec.

RBAC

  • Added permissions for traefik.io/middlewaretcps and ingressroutetcps (create, get, list, watch, patch, delete).
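
Expressed as a Python dict, the added rule would look roughly like this. The constant name is hypothetical; the API group, resources, and verbs are the ones listed above.

```python
# RBAC rule granting the operator access to the Traefik TCP resources.
TRAEFIK_RBAC_RULE = {
    "apiGroups": ["traefik.io"],
    "resources": ["middlewaretcps", "ingressroutetcps"],
    "verbs": ["create", "get", "list", "watch", "patch", "delete"],
}
```
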

Backward Compatibility

  • Existing clusters without the exposure field continue using loadbalancer - no breaking change.

Checklist

  • Link to issue this PR refers to: https://github.com/crate/cloud/issues/2893
  • Relevant changes are reflected in CHANGES.rst
  • Added or changed code is covered by tests
  • Documentation has been updated if necessary
  • Changed code does not contain any breaking changes (or this is a major version change)

Comment thread crate/operator/handlers/handle_create_cratedb.py
@tomach tomach force-pushed the ta/traefik-exposure branch 3 times, most recently from 7e68cd7 to f7df78f Compare April 28, 2026 07:40
@tomach tomach marked this pull request as ready for review April 28, 2026 09:06
@tomach tomach requested review from juanpardo and plaharanne April 28, 2026 09:09
Contributor

goat-ssh commented Apr 29, 2026

I've deployed on Dev and tried converting a DB -> ✅
Added single IP Whitelist -> ✅
Adding 2nd IP -> ⛔

cc @tomach :
[Image: Screenshot 2026-04-29 at 20 39 24]
[Image: Screenshot 2026-04-29 at 20 36 49]

Logs

[2026-04-29 17:33:02,513] kopf.objects [INFO ] [0fdb256a-cb6b-44b2-b97f-fc8be6949d95/3c48548b-47a5-44f9-ba7c-4ffd4065e1e2] Patching MiddlewareTCP cratedb-allow-3c48548b-47a5-44f9-ba7c-4ffd4065e1e2 with new CIDRs ['5.32.131.18/32', '213.222.49.221/32']

[2026-04-29 17:33:02,555] kopf.objects [ERROR ] [0fdb256a-cb6b-44b2-b97f-fc8be6949d95/3c48548b-47a5-44f9-ba7c-4ffd4065e1e2] Handler 'service_cidr_changes/spec.cluster.allowedCIDRs' failed with an exception. Will retry.
[2026-04-29 17:33:02,557] kopf.objects [DEBUG ] [0fdb256a-cb6b-44b2-b97f-fc8be6949d95/3c48548b-47a5-44f9-ba7c-4ffd4065e1e2] Patching with: {'metadata': {'annotations': {'operator.cloud.crate.io/service_cidr_changes.spec.cluster.allowedCIDRs': '{"started":"2026-04-29T17:32:00.714634+00:00","delayed":"2026-04-29T17:34:02.557071+00:00","purpose":"update","retries":2,"success":false,"failure":false,"message":"(400)\nReason: Bad Request\nHTTP response headers: <CIMultiDictProxy('Audit-Id': '912608ae-80b4-4585-af32-f318462d74be', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '98745b5d-9067-40d7-a501-1d3863d6619c', 'X-Kubernetes-Pf-Prioritylevel-Uid': '93fa3e08-7f26-44f4-982d-26db0175817b', 'Date': 'Wed, 29 Apr 2026 17:33:02 GMT', 'Content-Length': '211')>\nHTTP response body: {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"error decoding patch: json: cannot unmarshal object into Go value of type []handlers.jsonPatchOp\",\"reason\":\"BadRequest\",\"code\":400}\n\n"}', 'operator.cloud.crate.io/touch-dummy': None}}}

Name:                        kopf-event-bpd4q
Namespace:                   0fdb256a-cb6b-44b2-b97f-fc8be6949d95
Labels:                      <none>
Annotations:                 <none>
Action:                      Action?
API Version:                 events.k8s.io/v1
Deprecated First Timestamp:  2026-04-29T18:11:34Z
Deprecated Last Timestamp:   2026-04-29T18:11:34Z
Deprecated Source:
  Component:  kopf
Event Time:   2026-04-29T18:11:34.597754Z
Kind:         Event
Metadata:
  Creation Timestamp:  2026-04-29T18:11:34Z
  Generate Name:       kopf-event-
  Resource Version:    51139219
  UID:                 f7c1b8be-0207-4ad7-b24d-542c81b7c06e
Note:                  Handler 'service_cidr_changes/spec.cluster.allowedCIDRs' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/kopf/_core/actions/execution.py", line 276, in execute_handler_once
    result = await invoke_handler(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kopf/_core/actions/execution.py", line 371, in invoke_handler
    result = await invocation.invoke(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/...': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '98745b5d-9067-40d7-a501-1d3863d6619c', 'X-Kubernetes-Pf-Prioritylevel-Uid': '93fa3e08-7f26-44f4-982d-26db0175817b', 'Date': 'Wed, 29 Apr 2026 18:11:34 GMT', 'Content-Length': '211')>
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"error decoding patch: json: cannot unmarshal object into Go value of type []handlers.jsonPatchOp","reason":"BadRequest","code":400}


Reason:  Logging
Regarding:
  API Version:         cloud.crate.io/v1
  Kind:                CrateDB
  Name:                3c48548b-47a5-44f9-ba7c-4ffd4065e1e2
  Namespace:           0fdb256a-cb6b-44b2-b97f-fc8be6949d95
  UID:                 2a953c83-68f9-402e-b448-7fbc624ddb0a
Reporting Controller:  kopf
Reporting Instance:    dev
Type:                  Error
Events:                <none>

Contributor Author

tomach commented Apr 30, 2026

Adding 2nd IP -> ⛔

@goat-ssh thanks for catching this! The issue was in how I was patching the MiddlewareTCP resource: I didn't explicitly set the patch content type. That's why the first IP worked (it went through the create path) but the second one failed (it went through the patch path). Fixed in 4d5353a.
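
To illustrate the mismatch: the error in the logs ("cannot unmarshal object into Go value of type []handlers.jsonPatchOp") is consistent with the API server decoding the request as a JSON Patch (a list of operations) while the body was a plain object, which is the shape of a merge patch. The sketch below pairs each body shape with its content type. The helper names are hypothetical, the `ipAllowList.sourceRange` path assumes the Traefik v3 field name, and this is not the code from the fix commit.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

MERGE_PATCH = "application/merge-patch+json"
JSON_PATCH = "application/json-patch+json"

PatchBody = Union[Dict[str, Any], List[Dict[str, Any]]]


def cidr_merge_patch(cidrs: Sequence[str]) -> Tuple[str, PatchBody]:
    """Merge patch: the body is an object mirroring the resource."""
    return MERGE_PATCH, {"spec": {"ipAllowList": {"sourceRange": list(cidrs)}}}


def cidr_json_patch(cidrs: Sequence[str]) -> Tuple[str, PatchBody]:
    """JSON Patch (RFC 6902): the body is a list of operations."""
    op = {
        "op": "replace",
        "path": "/spec/ipAllowList/sourceRange",
        "value": list(cidrs),
    }
    return JSON_PATCH, [op]


def check_patch(content_type: str, body: PatchBody) -> None:
    """Reject body shapes that do not match the declared content type."""
    if content_type == JSON_PATCH and not isinstance(body, list):
        raise ValueError("JSON Patch needs a list of operations")
    if content_type == MERGE_PATCH and not isinstance(body, dict):
        raise ValueError("merge patch needs an object")
```

Sending the object-shaped body from `cidr_merge_patch` with the `JSON_PATCH` content type is the combination that `check_patch` rejects, and it matches the 400 response shown above.
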

Contributor

@plaharanne plaharanne left a comment

It looks good to me; however, it is probably safer if @goat-ssh and/or @WalBeh review and approve it as well.

@tomach tomach force-pushed the ta/traefik-exposure branch from 308e27c to 999d796 Compare May 8, 2026 07:50
@tomach tomach merged commit 7c660d1 into master May 8, 2026
16 checks passed
@tomach tomach deleted the ta/traefik-exposure branch May 8, 2026 07:56
4 participants